{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Further Hypothesis Testing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select this cell and type Ctrl-Enter to execute the code below.\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from scipy import stats\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "data = pd.read_csv(\"stars.csv\")\n",
    "type_key = ['Brown Dwarf', 'Red Dwarf', 'White Dwarf', 'Main Sequence', 'Supergiant','Hypergiant']\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7 - Correcting for multiple hypothesis tests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unfortunately, there is a problem with the previous analysis.\n",
    "\n",
    "Recall that the significance level, $\\alpha$, is defined as the probability of incorrectly rejecting $H_0$ when it is actually true (i.e. the probability of a Type I error).\n",
    "\n",
    "When we perform [*multiple related hypothesis tests*](https://en.wikipedia.org/wiki/Multiple_comparisons_problem), we increase the chances of producing such a Type I error.\n",
    "\n",
    "For example, if $\\alpha=0.05$ and we perform 100 tests, we *expect* to generate 5 Type I errors. This can be a serious problem when large numbers of hypothesis tests are carried out simultaneously, for example in screening thousands of genes for association with a disease.\n",
    "\n",
    "We therefore need a strategy to control the rate of Type I errors. A very simple approach is given by the [*Bonferroni correction*](https://en.wikipedia.org/wiki/Bonferroni_correction):"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bonferroni correction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Theory\n",
    "\n",
    "When conducting $n$ related hypothesis tests, we reduce the significance level for each test to $\\alpha/n$.\n",
    "\n",
    "The probability of making a Type I error *over the whole set of tests* (known as the *family-wise error rate*, FWER) therefore remains at $\\alpha$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Application"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Shapiro-Wilk test for normality\\n\")\n",
    "\n",
    "p_values = []\n",
    "alpha = 0.05\n",
    "n = 6\n",
    "\n",
    "for t in range(0,n):\n",
    "    sample = data[data.type == t].temperature.apply(np.log)\n",
    "    p_values.append(stats.shapiro(sample)[1])\n",
    "\n",
    "print(\"with uncorrected alpha =\",np.format_float_scientific(alpha,3),\":\")\n",
    "for i in range(0,n):\n",
    "    result = \"\"\n",
    "    if(p_values[i] < alpha): result = \"*** REJECT H0 ***\"\n",
    "    print(i, type_key[i], \": p =\", np.format_float_scientific(p_values[i],3), result) \n",
    "\n",
    "\n",
    "    \n",
    "print(\"\\nwith Bonferroni correction, alpha/n =\",np.format_float_scientific(alpha/n,3),\":\")\n",
    "for i in range(0,n):\n",
    "    result = \"\"\n",
    "    if(p_values[i] < alpha/n): result = \"*** REJECT H0 ***\"\n",
    "    print(i, type_key[i], \": p =\", np.format_float_scientific(p_values[i],3), result) \n",
    "\n",
    "\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "After correcting for multiple hypothesis testing, the red dwarf p-value is not significant.\n",
    "\n",
    "We should report to Professor Xu that log(temperature) is not normally distributed for the brown dwarf, supergiant and hypergiant types.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Alternative methods for multiple testing correction\n",
    "\n",
    "The Bonferroni correction is simple to apply, but it may be too conservative when there is a very large numbers of tests, or when the tests are not independent (for example, genes are often related to other genes so are likely to share properties).\n",
    "\n",
    "The [*Benjamini-Hochberg procedure*](https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini–Hochberg_procedure) is an alternative approach. Instead of controlling the FWER, this method controls the *proportion of the positive tests that are incorrect*, i.e. the proportion of rejected $H_0$'s that are Type I errors. This is known as the *false-discovery rate*, FDR.\n",
    "\n",
    "<br>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
